Liver lesion segmentation informed by joint liver segmentation
We propose a model for the joint segmentation of the liver and liver lesions
in computed tomography (CT) volumes. We build the model from two fully
convolutional networks, connected in tandem and trained together end-to-end. We
evaluate our approach on the 2017 MICCAI Liver Tumour Segmentation Challenge,
attaining competitive liver and liver lesion detection and segmentation scores
across a wide range of metrics. Unlike other top performing methods, our model
output post-processing is trivial, we do not use data external to the
challenge, and we propose a simple single-stage model that is trained
end-to-end. Despite this simplicity, our method nearly matches the top lesion segmentation
performance and achieves the second highest precision for lesion detection
while maintaining high recall.
Comment: Late upload of conference version (ISBI)
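The tandem composition described above can be sketched as follows. The two threshold-based "networks" here are hypothetical placeholders for the paper's fully convolutional networks; they serve only to show how the lesion model consumes the liver model's output and how a single joint loss covers both tasks:

```python
# Sketch of two segmentation models connected in tandem: the second model
# receives the input image together with the first model's output, and one
# joint loss trains both end-to-end. The threshold "networks" below are
# toy placeholders, not the paper's FCN architectures.

def liver_net(image):
    # Placeholder liver segmenter: mark pixels above an intensity threshold.
    return [[1.0 if px > 0.3 else 0.0 for px in row] for row in image]

def lesion_net(image, liver_mask):
    # Placeholder lesion segmenter: flag only bright pixels that lie inside
    # the predicted liver mask.
    return [[1.0 if (m == 1.0 and px > 0.7) else 0.0
             for px, m in zip(row_i, row_m)]
            for row_i, row_m in zip(image, liver_mask)]

def joint_forward(image):
    # Tandem composition: the lesion prediction is conditioned on the liver mask.
    liver = liver_net(image)
    lesion = lesion_net(image, liver)
    return liver, lesion

def joint_loss(pred, target):
    # Per-pixel squared error; in joint training the liver and lesion terms
    # would both contribute to one objective.
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr))

liver, lesion = joint_forward([[0.1, 0.5], [0.8, 0.9]])
```

Because the second network sees the first network's output directly, gradients from the lesion loss also shape the liver predictions, which is the point of training the pair end-to-end rather than in separate stages.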
Delving Deeper into Convolutional Networks for Learning Video Representations
We propose an approach to learn spatio-temporal features in videos from
intermediate visual representations we call "percepts" using
Gated-Recurrent-Unit Recurrent Networks (GRUs). Our method relies on percepts
that are extracted from all levels of a deep convolutional network trained on
the large ImageNet dataset. While high-level percepts contain highly
discriminative information, they tend to have a low spatial resolution.
Low-level percepts, on the other hand, preserve a higher spatial resolution
from which we can model finer motion patterns. Using low-level percepts can
lead to high-dimensional video representations. To mitigate this effect and
control the model's number of parameters, we introduce a variant of the GRU model
that leverages the convolution operations to enforce sparse connectivity of the
model units and share parameters across the input spatial locations.
We empirically validate our approach on both Human Action Recognition and
Video Captioning tasks. In particular, we achieve results equivalent to
state of the art on the YouTube2Text dataset using a simpler text-decoder model and
without extra 3D CNN features.
Comment: ICLR 201
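The convolutional GRU variant can be illustrated with a minimal single-channel sketch, assuming toy 3-tap kernels (a real model learns a kernel per input/output channel pair and layer). The key change is that every dense product in the standard GRU equations becomes a convolution, so parameters are shared across spatial locations and the hidden state keeps the input's spatial layout:

```python
import math

def conv1d(signal, kernel):
    # "Same"-size 1-D convolution with zero padding. The kernel weights are
    # shared across all spatial positions, which gives the sparse
    # connectivity and parameter sharing described in the abstract.
    k = len(kernel) // 2
    padded = [0.0] * k + signal + [0.0] * k
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(signal))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv_gru_step(x, h, wz, uz, wr, ur, wh, uh):
    # One GRU update with every matrix product replaced by a convolution:
    #   z = sigmoid(Wz * x + Uz * h)        update gate
    #   r = sigmoid(Wr * x + Ur * h)        reset gate
    #   h~ = tanh(Wh * x + Uh * (r . h))    candidate state
    #   h' = (1 - z) . h + z . h~
    z = [sigmoid(a + b) for a, b in zip(conv1d(x, wz), conv1d(h, uz))]
    r = [sigmoid(a + b) for a, b in zip(conv1d(x, wr), conv1d(h, ur))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(conv1d(x, wh), conv1d(rh, uh))]
    return [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

# Toy shared kernel; the hidden state has the same length as the input "percept".
k = [0.1, 0.5, 0.1]
h = conv_gru_step(x=[1.0, 0.0, -1.0], h=[0.0, 0.0, 0.0],
                  wz=k, uz=k, wr=k, ur=k, wh=k, uh=k)
```

With a dense GRU, the parameter count would grow with the square of the (height x width x channels) percept size; here it grows only with kernel size and channel count, which is what makes low-level, high-resolution percepts affordable.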
Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models by replacing the underlying point distribution models, which are
typically constructed using principal component analysis, with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking.
Comment: to appear in Image and Vision Computing Journal (IMAVIS)
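The contrast between a single PCA point-distribution model and an identity-expression factorized one can be sketched with toy landmark vectors; the bases and codes below are illustrative values, not learned quantities:

```python
# A shape is a flattened landmark vector. In the classic PDM, one PCA basis
# mixes identity and expression variation; in the factorized model they get
# separate bases, so the expression code can be used (or the identity code
# zeroed out for identity normalization) independently.

def pca_shape(mean, basis, code):
    # Classic PDM: shape = mean + basis * code, one entangled code.
    return [m + sum(b[i] * c for b, c in zip(basis, code))
            for i, m in enumerate(mean)]

def factorized_shape(mean, id_basis, id_code, expr_basis, expr_code):
    # Factorized PDM: shape = mean + identity term + expression term.
    ident = [sum(b[i] * c for b, c in zip(id_basis, id_code))
             for i in range(len(mean))]
    expr = [sum(b[i] * c for b, c in zip(expr_basis, expr_code))
            for i in range(len(mean))]
    return [m + d + e for m, d, e in zip(mean, ident, expr)]

mean = [0.0, 0.0, 0.0, 0.0]           # 2 landmarks, (x, y) flattened
id_basis = [[1.0, 0.0, 1.0, 0.0]]     # toy identity direction
expr_basis = [[0.0, 1.0, 0.0, -1.0]]  # toy expression direction
shape = factorized_shape(mean, id_basis, [0.5], expr_basis, [2.0])
# Identity normalization: keep the expression code, zero the identity code.
neutral = factorized_shape(mean, id_basis, [0.0], expr_basis, [2.0])
```

Dropping this factorized shape model into an active appearance model or constrained local model in place of the PCA basis is the substitution the abstract describes.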
Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning
Typical methods for unsupervised text style transfer often rely on two key
ingredients: 1) seeking the explicit disentanglement of the content and the
attributes, and 2) troublesome adversarial learning. In this paper, we show
that neither of these components is indispensable. We propose a new framework
that utilizes the gradients to revise the sentence in a continuous space during
inference to achieve text style transfer. Our method consists of three key
components: a variational auto-encoder (VAE), some attribute predictors (one
for each attribute), and a content predictor. The VAE and the two types of
predictors enable us to perform gradient-based optimization in the continuous
space, which is mapped from sentences in a discrete space, to find the
representation of a target sentence with the desired attributes and preserved
content. Moreover, the proposed method naturally has the ability to
simultaneously manipulate multiple fine-grained attributes, such as sentence
length and the presence of specific words, when performing text style transfer
tasks. Compared with previous adversarial learning based methods, the proposed
method is more interpretable, controllable and easier to train. Extensive
experimental studies on three popular text style transfer tasks show that the
proposed method significantly outperforms five state-of-the-art methods.
Comment: Association for the Advancement of Artificial Intelligence. AAAI 202
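The inference-time revision loop can be sketched in a toy 2-D latent space. The linear attribute predictor and quadratic content term below are hypothetical stand-ins for the paper's learned predictors, chosen so the gradients can be written by hand:

```python
# Gradient-based revision in a continuous space: starting from a sentence's
# latent z0, move z so the attribute predictor hits a target value while a
# content term keeps z close to z0. All models here are toy stand-ins.

def attribute_score(z):
    # Hypothetical attribute predictor: z[0] encodes the attribute strength.
    return z[0]

def content_distance(z, z0):
    # Hypothetical content predictor: squared distance from the source latent.
    return sum((a - b) ** 2 for a, b in zip(z, z0))

def revise(z0, target=3.0, lam=1.0, lr=0.1, steps=200):
    # Minimize (attribute_score(z) - target)^2 + lam * content_distance(z, z0)
    # by plain gradient descent on z at inference time; no adversarial
    # training is involved.
    z = list(z0)
    for _ in range(steps):
        g0 = 2.0 * (attribute_score(z) - target) + 2.0 * lam * (z[0] - z0[0])
        g1 = 2.0 * lam * (z[1] - z0[1])
        z[0] -= lr * g0
        z[1] -= lr * g1
    return z

z = revise([0.0, 1.0])
```

With one predictor per attribute, extra gradient terms can be summed into the same update, which is why multiple fine-grained attributes (length, word presence) can be steered simultaneously.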
Many-body mobility edge due to symmetry-constrained dynamics and strong interactions
We provide numerical evidence combined with an analytical understanding of
the many-body mobility edge for the strongly anisotropic spin-1/2 XXZ model in
a random magnetic field. The system dynamics can be understood in terms of
symmetry-constrained excitations about parent states with ferromagnetic and
anti-ferromagnetic short range order. These two regimes yield vastly different
dynamics producing an observable, tunable many-body mobility edge. We compute a
set of diagnostic quantities that verify the presence of the mobility edge and
discuss how weakly correlated disorder can tune the mobility edge further.
Comment: 10 pages, 5 figures
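The abstract does not name its diagnostic quantities; one standard choice in this literature is the consecutive level-spacing ratio, whose average distinguishes ergodic from localized spectra. A sketch on toy eigenvalues (not the XXZ spectrum):

```python
# Consecutive level-spacing ratio r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}),
# where s_n are gaps between sorted eigenvalues. Ergodic (delocalized) spectra
# average near ~0.53 (GOE statistics); localized spectra average near
# 2*ln(2) - 1 ~ 0.386 (Poisson statistics). A mobility edge shows up as this
# average changing with energy density.

def mean_spacing_ratio(eigenvalues):
    levels = sorted(eigenvalues)
    spacings = [b - a for a, b in zip(levels, levels[1:])]
    ratios = [min(s1, s2) / max(s1, s2)
              for s1, s2 in zip(spacings, spacings[1:]) if max(s1, s2) > 0]
    return sum(ratios) / len(ratios)

# Evenly spaced levels give r = 1 exactly (the "picket fence" limit).
r = mean_spacing_ratio([0.0, 1.0, 2.0, 3.0, 4.0])
```

In practice this statistic is computed within narrow energy windows of the many-body spectrum, so its energy dependence traces out the mobility edge directly.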
Twin Networks: Matching the Future for Sequence Generation
We propose a simple technique for encouraging generative RNNs to plan ahead.
We train a "backward" recurrent network to generate a given sequence in reverse
order, and we encourage states of the forward model to predict cotemporal
states of the backward model. The backward network is used only during
training, and plays no role during sampling or inference. We hypothesize that
our approach eases modeling of long-term dependencies by implicitly forcing the
forward states to hold information about the longer-term future (as contained
in the backward states). We show empirically that our approach achieves 9%
relative improvement for a speech recognition task, and achieves significant
improvement on a COCO caption generation task.
Comment: 12 pages, 3 figures, published at ICLR 201
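The matching objective can be sketched with minimal scalar RNNs; the fixed toy weights below stand in for trained parameters, and the point is only how the forward and backward state trajectories are aligned:

```python
import math

def rnn_states(sequence, w_in=0.5, w_rec=0.3):
    # Minimal scalar RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def twin_matching_loss(sequence):
    # The backward ("twin") network reads the sequence in reverse; reversing
    # its state trajectory puts its state for step t next to the forward
    # network's state for step t, and an L2 penalty ties the pair together.
    # The backward network is discarded after training.
    forward = rnn_states(sequence)
    backward = rnn_states(list(reversed(sequence)))[::-1]
    return sum((f - b) ** 2 for f, b in zip(forward, backward))

loss = twin_matching_loss([1.0, -0.5, 0.25, 2.0])
```

Because the backward state at step t summarizes the future of the sequence, pushing the forward state toward it forces the forward model to encode information about what comes next, which is the claimed aid to long-term dependencies.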
Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!
Implicit layer deep learning techniques, like Neural Differential Equations,
have become an important modeling framework due to their ability to adapt to
new problems automatically. Training a neural differential equation is
effectively a search over a space of plausible dynamical systems. However,
controlling the computational cost for these models is difficult since it
relies on the number of steps the adaptive solver takes. Most prior works have
either used higher-order methods, which reduce prediction times at the cost of
greatly increased training time, or relied on specific training algorithms that
reduce both training and prediction times but are harder to use as a drop-in
replacement due to strict requirements on automatic differentiation. In this
manuscript, we
use internal cost heuristics of adaptive differential equation solvers at
stochastic time points to guide the training toward learning a dynamical system
that is easier to integrate. We "close the black-box" and allow the use of our
method with any adjoint technique for gradient calculations of the differential
equation solution. We perform experimental studies to compare our method to
global regularization to show that we attain similar performance numbers
without compromising the flexibility of implementation on ordinary differential
equations (ODEs) and stochastic differential equations (SDEs). We develop two
sampling strategies to trade off between performance and training time. Our
method reduces the number of function evaluations to 0.556-0.733x and
accelerates predictions by 1.3-2x.